Using Validation to Avoid Overfitting in Boosting
Authors
Abstract
AdaBoost is a well-known, effective technique for increasing the accuracy of learning algorithms. However, it can overfit the training set because it focuses on misclassified examples, which may be noisy. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. The training set is partitioned into subsets, and AdaBoost is run on each subset to generate multiple hypotheses. The hypotheses are then applied to a validation set consisting of the entire training set, and their weights are adjusted based on that validation set. The hypotheses generated from all subsets are then aggregated by a weighted plurality vote for final classification. We show that our algorithm has similar performance on standard datasets and improved performance when classification noise is added. We also apply validation sets to another subset training algorithm, the BB algorithm.
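The pipeline described in the abstract (partition, boost each subset, re-weight on the full training set, weighted plurality vote) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say how the validation set adjusts the weights, so the `validation_weight` rule here (an AdaBoost-style weight computed from validation error and clipped at zero) is an assumption, as are the decision-stump base learner, the function names, and the toy data.

```python
import math
import random

def stump_predict(stump, x):
    """A stump (feature, threshold, sign) predicts sign if x[feature] >= threshold, else -sign."""
    f, t, s = stump
    return s if x[f] >= t else -s

def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best_err, best = float("inf"), None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for s in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_predict((f, t, s), x) != yi)
                if err < best_err:
                    best_err, best = err, (f, t, s)
    return best_err, best

def adaboost(X, y, rounds):
    """Plain AdaBoost for labels in {-1, +1}; returns the stumps it generated."""
    n = len(X)
    w = [1.0 / n] * n
    stumps = []
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        stumps.append(stump)
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, x))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return stumps

def validation_weight(stump, X, y):
    """ASSUMED rule: AdaBoost-style weight from validation error, clipped at 0."""
    err = sum(stump_predict(stump, x) != yi for x, yi in zip(X, y)) / len(X)
    err = min(max(err, 1e-10), 1 - 1e-10)
    return max(0.0, 0.5 * math.log((1 - err) / err))

def fit(X, y, n_subsets=4, rounds=3, seed=0):
    """Boost each disjoint subset, then weight every hypothesis on the full training set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    model = []
    for k in range(n_subsets):
        part = idx[k::n_subsets]  # disjoint partition (dagging)
        for stump in adaboost([X[i] for i in part], [y[i] for i in part], rounds):
            model.append((validation_weight(stump, X, y), stump))
    return model

def predict(model, x):
    """Weighted plurality vote over all hypotheses from all subsets."""
    votes = {}
    for weight, stump in model:
        label = stump_predict(stump, x)
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)

# Hypothetical toy data: one feature, label +1 when the value is at least 0.5.
X = [[i / 40] for i in range(40)]
y = [1 if i >= 20 else -1 for i in range(40)]
model = fit(X, y)
print(predict(model, [0.99]), predict(model, [0.1]))
```

Because each hypothesis is scored on the whole training set rather than only on the subset it was trained on, a stump that fits one subset's quirks receives a smaller vote in this sketch. That is one plausible reading of how validation could limit overfitting here.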
Similar resources
Functional Frank-Wolfe Boosting for General Loss Functions
Boosting is a generic learning method for classification and regression. Yet, as the number of base hypotheses becomes larger, boosting can lead to a deterioration of test performance. Overfitting is an important and ubiquitous phenomenon, especially in regression settings. To avoid overfitting, we consider using l1 regularization. We propose a novel Frank-Wolfe type boosting algorithm (FWBoost...
Avoiding Boosting Overfitting by Removing Confusing Samples
Boosting methods are known to exhibit noticeable overfitting on some datasets, while being immune to overfitting on other ones. In this paper we show that standard boosting algorithms are not appropriate in case of overlapping classes. This inadequateness is likely to be the major source of boosting overfitting while working with real world data. To verify our conclusion we use the fact that an...
Probing for Sparse and Fast Variable Selection with Model-Based Boosting
We present a new variable selection method based on model-based gradient boosting and randomly permuted variables. Model-based boosting is a tool to fit a statistical model while performing variable selection at the same time. A drawback of the fitting lies in the need of multiple model fits on slightly altered data (e.g., cross-validation or bootstrap) to find the optimal number of boosting it...
A Fast Scheme for Feature Subset Selection to Avoid Overfitting in AdaBoost
AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We show that with the introduction of a scoring function and the random selection of training data it is possible to create a smaller set of feature vectors. The selection of th...
Using Validation Sets to Avoid Overfitting in AdaBoost
AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. Half of the training set is removed ...